Appl. No. 09/939,954 

Amendment dated June 6, 2006 

Reply to Non-Final Office Action of January 6, 2006 


Amendments to Claims 

This listing of claims will replace all prior versions, and listings, of claims in the application: 

Listing of Claims 

Claims 1-18 (cancelled). 

Claim 19 (withdrawn): A method of representing an audio signal for machine learning 
comprising: 

(a) creating a perceptual representation of said audio signal by performing a 
frequency domain transform on at least one time-sampled window of a digital representation of 
said audio signal, said perceptual representation comprising component magnitudes of 
constituent frequency vectors that comprise said audio signal; 

(b) calculating a magnitude of each constituent frequency vector within said audio 
signal; 

(c) grouping each of said constituent frequency vectors into a number of frequency 
bands; 

(d) calculating an average magnitude of said constituent frequency vectors within 
each of said frequency bands; and 

(e) arranging said magnitudes into a learning representation. 
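
For illustration only, steps (a) through (e) above might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the naive discrete Fourier transform, the equal-width band split, the 64-sample window, and the choice of four bands are all hypothetical and do not appear in the claims.

```python
import cmath
import math


def dft_magnitudes(window):
    """Steps (a)-(b): naive DFT of one time-sampled window, returning the
    magnitude of each frequency component up to the Nyquist bin."""
    n = len(window)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def band_averages(mags, num_bands):
    """Steps (c)-(d): group bins into roughly equal-width bands (an
    assumption) and average the magnitudes within each band."""
    size = len(mags) / num_bands
    averages = []
    for b in range(num_bands):
        lo = int(round(b * size))
        hi = max(lo + 1, int(round((b + 1) * size)))
        band = mags[lo:hi]
        averages.append(sum(band) / len(band))
    return averages


def learning_representation(window, num_bands=4):
    """Step (e): arrange the band averages into a flat feature vector."""
    return band_averages(dft_magnitudes(window), num_bands)


# Hypothetical input: a 64-sample window holding a pure tone in DFT bin 4.
window = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
features = learning_representation(window, num_bands=4)
```

In this made-up example the tone falls in the lowest band, so only the first entry of `features` is appreciably non-zero. A practical system would more likely use a Fast Fourier Transform, as recited in claim 20.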

Claim 20 (withdrawn): The method according to claim 19 wherein said frequency domain 
transform is a Fast Fourier Transform. 

Claim 21 (withdrawn): The method according to claim 19 wherein an average magnitude 
of said constituent frequency vectors within each of said frequency bands further comprises an 
aggregate average magnitude over a plurality of said time-sampled windows. 
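
As a hedged illustration of the aggregate average recited in claim 21, the sketch below averages each band's magnitude across several windows. The function name and the sample values are hypothetical; the claim does not specify how the aggregate is computed, and a plain arithmetic mean is assumed here.

```python
def aggregate_band_averages(per_window_bands):
    """Average each band's magnitude across all supplied windows."""
    num_windows = len(per_window_bands)
    num_bands = len(per_window_bands[0])
    return [
        sum(window[b] for window in per_window_bands) / num_windows
        for b in range(num_bands)
    ]


# Hypothetical band averages for three windows of three bands each.
per_window = [
    [1.0, 4.0, 0.0],
    [3.0, 4.0, 0.0],
    [2.0, 1.0, 3.0],
]
aggregate = aggregate_band_averages(per_window)  # [2.0, 3.0, 1.0]
```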

Claim 22 (withdrawn): The method according to claim 21 where said plurality of 
time-sampled windows comprises 12 time-sampled windows. 

Claim 23 (withdrawn): The method according to claim 19 wherein no said frequency 
band includes any frequency greater than 11 kHz. 

Claim 24 (withdrawn): The method according to claim 19 wherein said frequency bands 
grow in size according to the golden ratio of frequency with respect to pitch. 

Claim 25 (withdrawn): The method according to claim 19 further comprising the step of 
converting said audio signal into a pulse code modulated bitstream for processing by said 
frequency domain transform. 

Claim 26 (withdrawn): A computer readable storage medium, storing therein a program 
of instructions for causing a computer to execute a process of representing an audio signal for 
machine learning, said process comprising the steps of: 

(a) creating a perceptual representation of said audio signal by performing a 
frequency domain transform on at least one time-sampled window of a digital representation of 
said audio signal, said perceptual representation comprising component magnitudes of 
constituent frequency vectors that comprise said audio signal; 

(b) calculating a magnitude of each constituent frequency vector within said audio 
signal; 

(c) grouping each of said constituent frequency vectors into a number of frequency 
bands; 

(d) calculating an average magnitude of said constituent frequency vectors within 
each of said frequency bands; and 

(e) arranging said magnitudes into a learning representation. 


Claims 27-43 (cancelled). 

Claim 44 (withdrawn): An apparatus for representing an audio signal for machine 
learning comprising: 

(a) a means for performing a frequency domain transform on at least one time- 
sampled window of a digital representation of said audio signal, said perceptual representation 
comprising component magnitudes of constituent frequency vectors that comprise said audio 
signal; 

(b) a means for calculating a magnitude of each constituent frequency vector; 

(c) a means for grouping each of said constituent frequency vectors into a number of 
frequency bands; 

(d) a means for calculating an average magnitude of said constituent frequency 
vectors within each of said frequency bands; and 

(e) a means for arranging said magnitudes into a learning representation. 

Claim 45 (withdrawn): The apparatus according to claim 44 wherein said means for 
performing a frequency domain transform comprises a means for performing a Fast Fourier 
Transform. 

Claim 46 (withdrawn): The apparatus according to claim 44 wherein no said frequency 
band includes any frequency greater than 11 kHz. 

Claim 47 (withdrawn): The apparatus according to claim 44 wherein said frequency 
bands grow in size according to the golden ratio of frequency with respect to pitch. 

Claim 48 (withdrawn): The apparatus according to claim 44 further comprising a means 
for converting said audio signal into a pulse code modulated bitstream for processing by said 
frequency domain transform. 

Claim 49 (new): A method of extracting classifying data from an audio signal, the 
method comprising the steps of: 

processing a perceptual representation of the audio signal into a learning representation of 
the audio signal; and 

inputting the learning representation into a multi-stage classifier, the multi-stage classifier 
comprising a first stage of support vector machine classifiers and a final stage metalearner 
classifier, each support vector machine classifier trained to identify one out of a plurality of audio 
classification categories and where the support vector machine classifiers are used to generate a 
metalearner vector that allows the final stage metalearner classifier to classify the audio signal 
into one out of the plurality of audio classification categories. 
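
The two-stage arrangement recited above can be pictured with the following minimal sketch. Everything in it is an assumption made for illustration: the category labels, the hand-picked linear SVM weights, and the single linear layer standing in for the final stage metalearner are hypothetical, and real parameters would come from training rather than being written by hand.

```python
def svm_score(weights, bias, features):
    """Linear SVM decision value w.x + b; larger means a closer match."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def metalearner_vector(svms, features):
    """First stage: one score per category-specific SVM."""
    return [svm_score(w, b, features) for (w, b) in svms]


def metalearner_classify(layer, vector, categories):
    """Final stage: a single linear layer followed by argmax, standing in
    for whatever metalearner (e.g. a neural network) is actually used."""
    outputs = [sum(w * v for w, v in zip(row, vector)) for row in layer]
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return categories[best]


# Hypothetical, hand-picked parameters; real ones would come from training.
categories = ["artist_a", "artist_b", "artist_c"]
svms = [
    ([1.0, 0.0, 0.0], -0.5),   # detector for artist_a
    ([0.0, 1.0, 0.0], -0.5),   # detector for artist_b
    ([0.0, 0.0, 1.0], -0.5),   # detector for artist_c
]
layer = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

features = [0.1, 0.9, 0.2]   # a learning representation (made up)
vector = metalearner_vector(svms, features)
label = metalearner_classify(layer, vector, categories)
```

Here each first-stage score becomes one entry of the metalearner vector, and the final stage selects a single category from that vector.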

Claim 50 (new): The method of claim 49 wherein the final stage metalearner classifier is 
a neural network classifier. 

Claim 51 (new): The method of claim 49 wherein each support vector machine classifier 
outputs a value reflecting how closely the audio signal conforms to the one out of the plurality of 
audio classification categories, each value then used in the metalearner vector. 

Claim 52 (new): The method of claim 49 wherein said audio classification categories 
comprise classifications by musical artist. 

Claim 53 (new): The method of claim 49 wherein the learning representation comprises 
dividing the perceptual representation of the audio signal into a plurality of time slices. 

Claim 54 (new): The method of claim 49 wherein the learning representation comprises 
dividing the perceptual representation of the audio signal into a plurality of frequency bands. 

Claim 55 (new): A computer readable storage medium, storing therein a program of 
instructions for causing a computer to execute a process of extracting classifying data from an 
audio signal, the process comprising the steps of: 

processing a perceptual representation of the audio signal into a learning representation of 
the audio signal; and 

inputting the learning representation into a multi-stage classifier, the multi-stage classifier 
comprising a first stage of support vector machine classifiers and a final stage metalearner 
classifier, each support vector machine classifier trained to identify one out of a plurality of audio 
classification categories and where the support vector machine classifiers are used to generate a 
metalearner vector that allows the final stage metalearner classifier to classify the audio signal 
into one out of the plurality of audio classification categories. 

Claim 56 (new): The computer readable storage medium of claim 55 wherein the final 
stage metalearner classifier is a neural network classifier. 

Claim 57 (new): The computer readable storage medium of claim 55 wherein each 
support vector machine classifier outputs a value reflecting how closely the audio signal 
conforms to the one out of the plurality of audio classification categories, each value then used in 
the metalearner vector. 

Claim 58 (new): The computer readable storage medium of claim 55 wherein said audio 
classification categories comprise classifications by musical artist. 

Claim 59 (new): The computer readable storage medium of claim 55 wherein the 
learning representation comprises dividing the perceptual representation of the audio signal into a 
plurality of time slices. 

Claim 60 (new): The computer readable storage medium of claim 55 wherein the 
learning representation comprises dividing the perceptual representation of the audio signal into a 
plurality of frequency bands. 

Claim 61 (new): An apparatus for classifying an audio signal comprising: 

means for processing a perceptual representation of the audio signal into a learning 
representation of the audio signal; and 

a multi-stage classifier, the multi-stage classifier further comprising a first stage of 
support vector machine classifiers and a final stage metalearner classifier, each support vector 
machine classifier trained to identify one out of a plurality of audio classification categories from 
the learning representation of the audio signal and where the support vector machine classifiers 
are used to generate a metalearner vector that allows the final stage metalearner classifier to 
classify the audio signal into one out of the plurality of audio classification categories. 

Claim 62 (new): The apparatus of claim 61 wherein the final stage metalearner classifier 
is a neural network classifier. 

Claim 63 (new): The apparatus of claim 61 wherein each support vector machine 
classifier outputs a value reflecting how closely the audio signal conforms to the one out of the 
plurality of audio classification categories, each value then used in the metalearner vector. 

Claim 64 (new): The apparatus of claim 61 wherein said audio classification categories 
comprise classifications by musical artist. 

Claim 65 (new): The apparatus of claim 61 wherein the learning representation 
comprises dividing the perceptual representation of the audio signal into a plurality of time slices. 

Claim 66 (new): The apparatus of claim 61 wherein the learning representation 
comprises dividing the perceptual representation of the audio signal into a plurality of frequency 
bands. 